Introduction

Wine tasting can range from a casual pastime to a lucrative profession. For professional sommeliers, considerable time and training are required to rate wine quality adequately. Intuitively, we might expect expert ratings to reflect the underlying chemical composition of the wines.

Thus, we wanted to analyze how accurately expert wine quality ratings can be predicted using a set of easily measured chemical components.

Data

Two datasets of expert quality ratings of red and white Vinho Verde wines were used. The data were obtained from http://archive.ics.uci.edu/ml/datasets/wine+quality. There are a total of 1599 red wines and 4898 white wines in the two datasets.

The outcome variable is wine quality, an ordinal variable that theoretically ranges from 0 (poor quality) to 10 (excellent quality). However, the observed ratings range only from 3 to 9. The data are highly unbalanced across quality classes.

The predictor variables are 11 physicochemical wine components: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol.

The boxplots below show how each predictor is distributed across quality ratings for the red and white wine datasets.

The plots below show the correlations among the 11 predictor variables for the red and white wine datasets, respectively. The darker the blue, the more positively the variables are correlated; the darker the red, the more negatively they are correlated.

Methods

For the analysis, the red and white wine datasets were each split into training and test sets: 80% of the available data was sampled for training and the remaining 20% was held out for testing. The training data was used to fit each model and the test data was used to evaluate its prediction accuracy. Because the quality variable was highly unbalanced in the full dataset, the split preserved the relative frequencies of the quality ratings in both the training and test data.
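The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stratified_split`, the seed, and the toy labels are hypothetical and are not taken from the original analysis, which does not say what software was used.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx) so that each class keeps roughly
    the same relative frequency in both parts."""
    rng = random.Random(seed)

    # Group row indices by their quality rating.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Each class contributes about train_fraction of its rows to training.
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Toy, made-up labels (not the wine data), unbalanced on purpose.
toy_quality = [5] * 50 + [6] * 40 + [7] * 10
train_idx, test_idx = stratified_split(toy_quality)
print(len(train_idx), len(test_idx))
```

Splitting within each rating, rather than over the pooled rows, is what keeps rare ratings represented in both parts.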

We fitted four models to classify the quality ratings: a random forest as the machine learning method, and three likelihood-based models (linear regression, a partial proportional odds model, and a multinomial model).

Linear Regression

Even though the quality variable only ranges from 0 to 10 (from 3 to 9 in the actual dataset), we fitted a linear model that treats it as a continuous variable. The linear model is \[Z = X\beta + \epsilon,\ \text{where}\ \epsilon\sim N(0,\sigma^2 I)\] where \(Z\) is the response vector (quality), \(X\) is the design matrix of predictor variables, and \(\beta\) is the vector of regression coefficients; the distributional assumption on \(\epsilon\) implies that the response is normally distributed. It is well known that the least squares estimate of \(\beta\) is \[\hat\beta = (X^T X)^{-1}X^T Z.\] Since the predicted value from a linear model is continuous while the quality ratings are integers, we rounded each predicted value to the nearest integer to obtain the final predicted quality rating.
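As a minimal sketch of the two steps above (least squares fit, then rounding), the following pure-Python example solves the normal equations on a tiny made-up dataset. The helper names, the single predictor, and the round-half-up tie rule are assumptions made for illustration; the original does not say how ties were handled or which software was used.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular (sketch only, no error handling)."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, z):
    """Least squares estimate from the normal equations (X^T X) beta = X^T Z."""
    x = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(x[0])
    xtx = [[sum(xi[j] * xi[k] for xi in x) for k in range(p)] for j in range(p)]
    xtz = [sum(xi[j] * zi for xi, zi in zip(x, z)) for j in range(p)]
    return solve_linear_system(xtx, xtz)


def predict_rating(beta, row):
    """Round the fitted value to the nearest integer (ties rounded up)."""
    fitted = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
    return math.floor(fitted + 0.5)


# Toy, made-up data: one predictor (think alcohol) and four ratings.
toy_rows = [[9.0], [10.0], [11.0], [12.0]]
toy_quality = [5, 5, 6, 6]
beta = fit_ols(toy_rows, toy_quality)
print(beta, predict_rating(beta, [12.5]))
```

Solving the normal equations directly is fine for a small illustration; a QR or SVD based solver is usually preferred for numerical stability when the predictors are strongly correlated.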

Partial Proportional Odds Models

Three different approaches were considered:

Multinomial Regression

Random Forest

For the Random Forest, we used all 11 physicochemical variables. For the likelihood-based models, we fitted a full model with all 11 variables and, hypothesizing that a reduced model could be found, also fitted a reduced model. The reduced model was determined by looking at the correlations between predictors and at OLS best subset selection: among the models selected by best subset selection, we chose the one with no collinearity problem. The covariates of the reduced model are, for red wine: volatile acidity, total sulfur dioxide, pH, alcohol, and sulphates; and for white wine: pH, volatile acidity, residual sugar, and alcohol.

We compared the four models on the following metrics: accuracy, Kappa, and weighted Kappa. Accuracy is the proportion of correct classifications out of all classifications. Kappa is a commonly used statistic that captures how much better the classification agrees with the true ratings than would be expected from chance agreement alone. Weighted Kappa is an extension of Kappa that is more useful for data with an inherent ordering, since it penalizes misclassifications in proportion to their distance from the true category. For instance, when the true quality rating is 4, a prediction of 7 is penalized more severely than a prediction of 5. Since our outcome variable, quality, is ordered, we used the weighted Kappa to select the final model.
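The three metrics can be computed from a confusion matrix as sketched below. The function name and the toy matrix are hypothetical. The linear weights |i - j| / (k - 1) are an assumption chosen to match the "proportional to the distance" description; the original does not state whether linear or quadratic weights were used.

```python
def agreement_metrics(confusion):
    """confusion[i][j] = number of items with true category i and predicted
    category j, with categories listed in their natural order.
    Returns (accuracy, kappa, linearly weighted kappa)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    # Observed agreement: share of items on the diagonal.
    accuracy = sum(confusion[i][i] for i in range(k)) / total

    # Agreement expected by chance, given the row and column totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    # Weighted kappa: disagreement grows linearly with category distance.
    def weight(i, j):
        return abs(i - j) / (k - 1)

    observed_dis = sum(
        weight(i, j) * confusion[i][j] for i in range(k) for j in range(k)
    ) / total
    expected_dis = sum(
        weight(i, j) * row_tot[i] * col_tot[j] for i in range(k) for j in range(k)
    ) / total ** 2
    weighted_kappa = 1 - observed_dis / expected_dis
    return accuracy, kappa, weighted_kappa


# Toy, made-up 3-category confusion matrix (rows = true, columns = predicted).
toy_confusion = [
    [2, 1, 0],
    [1, 4, 1],
    [0, 1, 2],
]
accuracy, kappa, weighted_kappa = agreement_metrics(toy_confusion)
print(round(accuracy, 4), round(kappa, 4), round(weighted_kappa, 4))
```

In this toy matrix every error is off by only one category, so the weighted Kappa comes out higher than the unweighted Kappa, which treats all errors alike.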

Results: Linear Model

Below is a visualization of the confusion matrix for red wine. The linear model tends to pull its predictions towards the mean quality rating, both because the model estimates the conditional mean and because the data are highly unbalanced, with most observations concentrated at quality ratings 5 and 6.

This pull towards the mean is even more pronounced for white wine, where most predictions are 6.
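As a rough illustration of how a confusion matrix, and the "percent correct by category" figures reported in the results tables, can be derived from a list of true and predicted ratings, consider the sketch below. The function names and toy ratings are hypothetical. Returning `None` for a category with no true observations is an assumption of this sketch.

```python
def confusion_matrix(true_ratings, predicted_ratings, categories):
    """Rows are true categories, columns are predicted categories."""
    index = {c: i for i, c in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for t, p in zip(true_ratings, predicted_ratings):
        matrix[index[t]][index[p]] += 1
    return matrix


def percent_correct_by_category(matrix, categories):
    """Share of each true category that was predicted correctly, in percent.
    Categories with no true observations map to None."""
    result = {}
    for i, c in enumerate(categories):
        n = sum(matrix[i])
        result[c] = 100.0 * matrix[i][i] / n if n else None
    return result


# Toy, made-up ratings in which predictions cluster around 5 and 6.
toy_true = [4, 5, 5, 5, 6, 6, 6, 7]
toy_pred = [5, 5, 5, 6, 6, 6, 5, 6]
toy_categories = [4, 5, 6, 7]

toy_matrix = confusion_matrix(toy_true, toy_pred, toy_categories)
toy_percent = percent_correct_by_category(toy_matrix, toy_categories)
print(toy_matrix)
print(toy_percent)
```

A model that pulls its predictions towards the middle ratings shows up here as zero percent correct in the extreme categories, even when overall accuracy looks acceptable.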

Results: Partial Proportional Odds Model (White Wine)

Comparison of White Wine Proportional Odds Model Results
(F = full model, R = reduced model)

                       Overall Results                       Percent Correct by Category
Model                  Accuracy (%)  Kappa   Weighted Kappa  3  4  5        6        7        8  9
Proportional Odds (F)  52.1472       0.2126  0.4053          0  0  45.3608  78.1321  19.8864  0  0
Proportional Odds (R)  51.7382       0.2108  0.3993          0  0  51.2027  74.9431  15.9091  0  0

Results: Proportional Odds Model (White Wine, Full)

Results: Partial Proportional Odds Model (Red Wine)

  • Both the proportional odds and partial proportional odds models converged
  • Results are presented for the partial proportional odds model in which the coefficient for total sulfur dioxide is allowed to vary with the level of wine quality

Results: Comparison of Partial Proportional Odds Models (Red Wine)

Comparison of Red Wine Proportional Odds Model Results
(F = full model, R = reduced model)

                               Overall Results                       Percent Correct by Category
Model                          Accuracy (%)  Kappa   Weighted Kappa  3  4  5        6        7        8
Proportional Odds (F)          58.9905       0.3232  0.5258          0  0  72.0588  59.8425  33.3333  0
Proportional Odds (R)          58.0442       0.3065  0.4707          0  0  72.0588  59.8425  25.6410  0
Partial Proportional Odds (F)  58.9905       0.3233  0.5162          0  0  72.0588  60.6299  30.7692  0
Partial Proportional Odds (R)  58.3596       0.3117  0.4742          0  0  72.0588  59.8425  28.2051  0

Results: Proportional Odds Model (Red Wine, Full)

Results: Partial Proportional Odds Model (Red Wine, Full)

Results: Multinomial Regression (Red Wine Quality Classification)

Comparison of Multinomial Regression Models for Red Wine Quality

                                    Overall Results                       Percent Correct by Category
Model                               Accuracy (%)  Kappa   Weighted Kappa  3  4   5        6        7        8
Full Model (Linear Terms)           58.3596       0.3158  0.5227          0  20  74.2647  55.9055  28.2051  0
Reduced Model (Linear Terms)        58.3596       0.3193  0.4952          0  10  72.7941  56.6929  33.3333  0
Reduced Model (Second Order Terms)  55.2050       0.2702  0.4789          0  0   67.6471  55.1181  33.3333  0

Results: Multinomial Regression - Full Model Confusion Matrix (Red Wine)

Results: Multinomial Regression (White Wine Quality Classification)

Comparison of Multinomial Regression Models for White Wine Quality

                                    Overall Results                       Percent Correct by Category
Model                               Accuracy (%)  Kappa   Weighted Kappa  3  4      5        6        7        8  9
Full Model (Linear Terms)           54.0900       0.2451  0.4121          0  9.375  51.2027  79.4989  15.9091  0  0
Reduced Model (Linear Terms)        51.9427       0.2201  0.4093          0  3.125  54.6392  72.6651  16.4773  0  0
Reduced Model (Second Order Terms)  53.2720       0.2396  0.4010          0  3.125  54.2955  74.4875  19.8864  0  0

Results: Multinomial Regression - Full Model Confusion Matrix (White Wine)

Results: Random Forest

Random Forest Results for Red and White Wine

            Overall Results                       Percent Correct by Category
Dataset     Accuracy (%)  Kappa   Weighted Kappa  3  4   5      6      7      8      9
Red Wine    70.98         0.5263  0.6168          0  0   83.82  70.87  53.85  0.00   NA
White Wine  67.28         0.4862  0.6542          0  25  67.35  80.87  47.73  42.86  0

Results: Random Forest (Variable Importance)

Results: Random Forest (Red Wine)

Results: Random Forest (White Wine)

Comparison of Results: Red Wine

Comparison of Results for Red Wine

                           Overall Results                       Percent Correct by Category
Model                      Accuracy (%)  Kappa   Weighted Kappa  3  4   5        6        7        8
Random Forest              70.9800       0.5263  0.6168          0  0   83.8200  70.8700  53.8500  0
Proportional Odds          58.9905       0.3232  0.5258          0  0   72.0588  59.8425  33.3333  0
Multinomial                58.3596       0.3158  0.5227          0  20  74.2647  55.9055  28.2051  0
Partial Proportional Odds  58.9905       0.3233  0.5162          0  0   72.0588  60.6299  30.7692  0
Linear Regression          57.7300       0.2998  0.4996          0  0   67.6500  64.5700  23.0800  0

Comparison of Results: White Wine

Comparison of Results for White Wine

                   Overall Results                       Percent Correct by Category
Model              Accuracy (%)  Kappa   Weighted Kappa  3  4       5        6        7        8      9
Random Forest      67.2800       0.4862  0.6542          0  25.000  67.3500  80.8700  47.7300  42.86  0
Linear Regression  52.6100       0.2162  0.4211          0  0.000   39.8600  81.7800  22.1600  0.00   0
Multinomial        54.0900       0.2451  0.4121          0  9.375   51.2027  79.4989  15.9091  0.00   0
Proportional Odds  52.1472       0.2126  0.4053          0  0.000   45.3608  78.1321  19.8864  0.00   0

Discussion: Random Forest

Using random forests, we predicted expert wine quality ratings fairly accurately. For the white wine, the prediction accuracy was 67.28% and the weighted Kappa was 0.6542. For the red wine, the prediction accuracy was 70.98% and the weighted Kappa was 0.6168. Under the cutoffs proposed by Landis and Koch, both weighted Kappa values fall in the 0.61 to 0.80 band, which corresponds to substantial agreement with the expert ratings.
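For reference, the Landis and Koch bands can be written as a small lookup. The function name is hypothetical, and treating each upper bound as inclusive is an assumption of this sketch, since the published bands (for example 0.41-0.60 and 0.61-0.80) leave the values in between unassigned.

```python
def landis_koch_label(kappa):
    """Map a Kappa value to the agreement label proposed by Landis and Koch (1977).
    Upper bounds are treated as inclusive (an assumption of this sketch)."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("Kappa cannot exceed 1")


# Weighted Kappa values reported for the random forest models.
for name, value in [("white wine", 0.6542), ("red wine", 0.6168)]:
    print(name, value, landis_koch_label(value))
```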

Discussion: Likelihood Based Approaches

Compared with the Random Forest, the likelihood-based models had prediction accuracies roughly 12 to 15 percentage points lower and weighted Kappas roughly 0.1 to 0.25 lower. Among the likelihood-based models, however, there is no clear winner. The linear model is fast and easy to fit thanks to its closed-form coefficient estimate, but it showed a tendency to predict towards the mean value. The multinomial model had higher accuracy for predicting lower and higher quality wines. Surprisingly, accounting for the ordered nature of the ratings did not make a difference in predictions.

Discussion: Limitations and Future Directions

We speculate that some misclassification was due to unmeasured chemical components, since our dataset included only 11 variables. In addition to looking at the relationship between the full set of predictors and wine quality, it would be interesting to characterize how individual chemicals relate to wine quality. There were only a small number of truly excellent wines (rating of 8 or higher), which made these ratings particularly difficult to predict. It would be worthwhile to examine whether outlier detection algorithms are better suited to characterizing these wines than the approaches we took.

Bottom Line

Expert wine quality ratings can be predicted reasonably well using chemical components, but true wine connoisseurs are still better off consulting a sommelier.

References

Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-74.